    spa: Semi-Supervised Semi-Parametric Graph-Based Estimation in R

    In this paper, we present an R package that combines feature-based (X) data and graph-based (G) data for prediction of the response Y. In this particular case, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). We examine an approach for fitting Y = Xβ + f(G) where β is a coefficient vector and f is a function over the vertices of the graph. The procedure is semi-supervised in nature (trained on the labeled and unlabeled sets), requiring iterative algorithms for fitting this estimate. The package provides several key functions for fitting and evaluating an estimator of this type. The package is illustrated on a text analysis data set, where the observations are text documents (papers), the response is the category of paper (either applied or theoretical statistics), the X information is the name of the journal in which the paper resides, and the graph is a co-citation network, with each vertex an observation and each edge the number of times that the two papers cite a common paper. An application involving classification of protein location using a protein interaction graph and an application involving classification on a manifold with part of the feature data converted to a graph are also presented.
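
To make the idea of iteratively fitting a model with one linear part and one graph part concrete, here is a minimal Python sketch. It is not the spa package's algorithm or API. It assumes a single feature with a scalar coefficient, a symmetric weight matrix `W`, and a Laplacian-style smoothness penalty with a hypothetical tuning parameter `lam`. The function name `fit_semiparametric` and all of its arguments are made up for illustration. The sketch alternates a least-squares update of the coefficient on the labeled residuals with Gauss-Seidel sweeps that smooth the graph function over both labeled and unlabeled vertices.

```python
def fit_semiparametric(x, y, W, labeled, lam=1.0, n_outer=100, n_sweeps=20):
    """Alternating fit of y ~ x * beta + f(vertex), under illustrative assumptions.

    Minimizes  sum_{i labeled} (y_i - x_i*beta - f_i)^2
             + lam * sum_{i<j} W[i][j] * (f_i - f_j)^2
    by block coordinate descent. y[i] is only read for labeled vertices.
    """
    n = len(x)
    is_labeled = [False] * n
    for i in labeled:
        is_labeled[i] = True
    # Vertex degree, ignoring any self-loop weight on the diagonal.
    degree = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]

    beta = 0.0
    f = [0.0] * n
    for _ in range(n_outer):
        # beta step: one-dimensional least squares on the labeled residuals y - f.
        num = sum(x[i] * (y[i] - f[i]) for i in labeled)
        den = sum(x[i] * x[i] for i in labeled)
        beta = num / den

        # f step: Gauss-Seidel sweeps on the penalized problem with beta fixed.
        for _ in range(n_sweeps):
            for i in range(n):
                neighbor_sum = sum(W[i][j] * f[j] for j in range(n) if j != i)
                if is_labeled[i]:
                    residual = y[i] - x[i] * beta
                    f[i] = (residual + lam * neighbor_sum) / (1.0 + lam * degree[i])
                elif degree[i] > 0:
                    # Unlabeled vertices take the weighted average of their neighbors.
                    f[i] = neighbor_sum / degree[i]
    return beta, f


def chain_graph(n):
    """Weight matrix of a path graph: vertex i is linked to i-1 and i+1."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        W[i][i + 1] = W[i + 1][i] = 1.0
    return W


# Toy data: six vertices on a chain, two of them unlabeled (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, None, 8.0, None, 12.0]
beta, f = fit_semiparametric(x, y, chain_graph(6), labeled=[0, 1, 3, 5])
predictions = [x[i] * beta + f[i] for i in range(6)]
```

The unlabeled vertices never contribute a loss term, but they still shape the fit because the smoothness penalty passes information through them. That is the sense in which a procedure of this kind is trained on both the labeled and unlabeled sets.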

    On multi-view learning with additive models

    In many scientific settings data can be naturally partitioned into variable groupings called views. Common examples include environmental (1st view) and genetic information (2nd view) in ecological applications, and chemical (1st view) and biological (2nd view) data in drug discovery. Multi-view data also occur in text analysis and proteomics applications where one view consists of a graph with observations as the vertices and a weighted measure of pairwise similarity between observations as the edges. Further, in several of these applications the observations can be partitioned into two sets, one where the response is observed (labeled) and the other where the response is not (unlabeled). The problem of simultaneously addressing viewed data and incorporating unlabeled observations in training is referred to as multi-view transductive learning. In this work we introduce and study a comprehensive generalized fixed point additive modeling framework for multi-view transductive learning, where any view is represented by a linear smoother. The problem of view selection is discussed using a generalized Akaike Information Criterion, which provides an approach for testing the contribution of each view. An efficient implementation is provided for fitting these models with both backfitting and local-scoring type algorithms adjusted to semi-supervised graph-based learning. The proposed technique is assessed on both synthetic and real data sets and is shown to be competitive with state-of-the-art co-training and graph-based techniques. Comment: Published at http://dx.doi.org/10.1214/08-AOAS202 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    ada: An R Package for Stochastic Boosting

    Boosting is an iterative algorithm that combines simple classification rules with "mediocre" performance in terms of misclassification error rate to produce a highly accurate classification rule. Stochastic gradient boosting provides an enhancement which incorporates a random mechanism at each boosting step showing an improvement in performance and speed in generating the ensemble. ada is an R package that implements three popular variants of boosting, together with a version of stochastic gradient boosting. In addition, useful plots for data analytic purposes are provided along with an extension to the multi-class case. The algorithms are illustrated with synthetic and real data sets.
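
The description of boosting as a combination of weak rules can be illustrated with a short Python sketch of plain discrete AdaBoost using one-dimensional decision stumps. This is an assumption-laden illustration: it does not reproduce the ada package's R interface, and it leaves out the stochastic (subsampling) variant and the other boosting variants the abstract mentions. The names `fit_stump`, `adaboost`, and `predict` are hypothetical.

```python
import math


def stump_predict(xi, threshold, polarity):
    """A stump predicts `polarity` to the right of the threshold, else its negation."""
    return polarity if xi > threshold else -polarity


def fit_stump(x, y, w):
    """Return (weighted error, threshold, polarity) of the best stump under weights w."""
    xs = sorted(set(x))
    # Candidates: one threshold below all points (constant rules) plus midpoints.
    thresholds = [xs[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(x, y, w)
                      if stump_predict(xi, t, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(x, y, n_rounds=50):
    """Discrete AdaBoost with labels in {-1, +1}; returns a list of weighted stumps."""
    n = len(x)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, t, polarity = fit_stump(x, y, w)
        # Clamp to avoid division by zero or log of zero on degenerate rounds.
        err = min(max(err, 1e-12), 1.0 - 1e-12)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Up-weight misclassified points, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, polarity))
             for xi, yi, wi in zip(x, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, xi):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(xi, t, polarity)
                for alpha, t, polarity in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can classify: +1 on both ends, -1 in the middle.
x = [float(i) for i in range(10)]
y = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
model = adaboost(x, y, n_rounds=50)
fitted = [predict(model, xi) for xi in x]
```

On this toy pattern every individual stump misclassifies at least one interval, so each rule is "mediocre" on its own, yet the weighted vote can reproduce all three intervals.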

    Predicting whole genome protein interaction networks from primary sequence data in model and non-model organisms using ENTS

    Background: The large-scale identification of physical protein-protein interactions (PPIs) is an important step toward understanding how biological networks evolve and generate emergent phenotypes. However, experimental identification of PPIs is a laborious and error-prone process, and current methods of PPI prediction tend to be highly conservative or require large amounts of functional data that may not be available for newly-sequenced organisms. Results: In this study we demonstrate a random-forest based technique, ENTS, for the computational prediction of protein-protein interactions based only on primary sequence data. Our approach is able to efficiently predict interactions on a whole-genome scale for any eukaryotic organism, using pairwise combinations of conserved domains and predicted subcellular localization of proteins as input features. We present the first predicted interactome for the forest tree Populus trichocarpa in addition to the predicted interactomes for Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Arabidopsis thaliana. Comparing our approach to other PPI predictors, we find that ENTS performs comparably to or better than a number of existing approaches, including several that utilize a variety of functional information for their predictions. We also find that the predicted interactions are biologically meaningful, as indicated by similarity in functional annotations and enrichment of co-expressed genes in public microarray datasets. Furthermore, we demonstrate some of the biological insights that can be gained from these predicted interaction networks. We show that the predicted interactions yield informative groupings of P. trichocarpa metabolic pathways, literature-supported associations among human disease states, and theory-supported insight into the evolutionary dynamics of duplicated genes in paleopolyploid plants. Conclusion: We conclude that the ENTS classifier will be a valuable tool for the de novo annotation of genome sequences, providing initial clues about regulatory and metabolic network topology, and revealing relationships that are not immediately obvious from traditional homology-based annotations.

    Enohpoxas Tetrauq

    Cook Hall 212, Thursday Evening, April 25, 2002, 7:00 p.m.

    Mitochondrial ATP fuels ABC transporter-mediated drug efflux in cancer chemoresistance

    Chemotherapy remains the standard of care for most cancers worldwide; however, the development of chemoresistance due to the presence of the drug-effluxing ATP binding cassette (ABC) transporters remains a significant problem. The development of safe and effective means to overcome chemoresistance is critical for achieving durable remissions in many cancer patients. We have investigated the energetic demands of ABC transporters in the context of the metabolic adaptations of chemoresistant cancer cells. Here we show that ABC transporters use mitochondrial-derived ATP as a source of energy to efflux drugs out of cancer cells. We further demonstrate that the loss of methylation-controlled J protein (MCJ) (also named DnaJC15), an endogenous negative regulator of mitochondrial respiration, in chemoresistant cancer cells boosts their ability to produce ATP from mitochondria and fuel ABC transporters. We have developed MCJ mimetics that can attenuate mitochondrial respiration and safely overcome chemoresistance in vitro and in vivo. Administration of MCJ mimetics in combination with standard chemotherapeutic drugs could therefore become an alternative strategy for treatment of multiple cancers.

    Early parenting intervention aimed at maternal sensitivity and discipline: A process evaluation

    This study investigated the influence of the intervention process on the effectiveness of a program aimed at promoting positive parenting. The study involved a homogeneous intervention sample (N = 120) of mothers and their 1-, 2-, or 3-year-old children screened for high levels of externalizing problems. The alliance between mother and intervener, mothers' active skills implementation, and father involvement were examined in relation to changes in maternal sensitivity and positive discipline strategies. Results revealed that only alliance predicted change in positive parenting. Implications for future process evaluations and intervention programs are discussed. © 2008 Wiley Periodicals, Inc.